Advancing Linguistic Features and Insights by Label-informed Feature Grouping: An Exploration in the Context of Native Language Identification
Abstract
We propose a hierarchical clustering approach designed to group linguistic features for supervised machine learning that is inspired by variationist linguistics. The method makes it possible to abstract away from the individual feature occurrences by grouping features together that behave alike with respect to the target class, thus providing a new, more general perspective on the data. On the one hand, it reduces data sparsity, leading to quantitative performance gains. On the other, it supports the formation and evaluation of hypotheses about individual choices of linguistic structures. We explore the method using features based on verb subcategorization information and evaluate the approach in the context of the Native Language Identification (NLI) task.
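The core idea of grouping features that behave alike with respect to the target class can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the Euclidean distance between per-label relative-frequency profiles, the merge-until-`n_groups` stopping rule, the function names, and the toy subcategorization frames and L1 labels (`DE`, `ZH`) are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def label_profile(counts, labels):
    """Turn per-label counts into a relative-frequency vector.

    Assumes the feature (or group) occurs at least once overall.
    """
    total = sum(counts.get(label, 0) for label in labels)
    return [counts.get(label, 0) / total for label in labels]


def distance(p, q):
    """Euclidean distance between two label profiles (an assumed choice)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def group_features(feature_counts, labels, n_groups):
    """Agglomeratively merge features with similar label profiles.

    Each cluster holds its member feature names and the pooled counts,
    so a merged group is less sparse than its individual members.
    """
    clusters = [({name}, dict(counts))
                for name, counts in sorted(feature_counts.items())]
    while len(clusters) > n_groups:
        # Pick the pair of clusters whose label profiles are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(
                label_profile(clusters[ij[0]][1], labels),
                label_profile(clusters[ij[1]][1], labels),
            ),
        )
        names = clusters[i][0] | clusters[j][0]
        pooled = {label: clusters[i][1].get(label, 0) + clusters[j][1].get(label, 0)
                  for label in labels}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((names, pooled))
    return [sorted(names) for names, _ in clusters]


# Hypothetical counts of verb subcategorization frames per native language.
toy_counts = {
    "give:NP_NP": {"DE": 8, "ZH": 2},
    "tell:NP_S": {"DE": 7, "ZH": 3},
    "give:NP_PP": {"DE": 2, "ZH": 8},
    "tell:NP_PP": {"DE": 3, "ZH": 7},
}
groups = group_features(toy_counts, ["DE", "ZH"], n_groups=2)
print(groups)
```

In this toy setting the two frames that skew toward one label end up in one group and the two that skew toward the other label in a second group; the pooled counts of each group could then serve as a single, less sparse feature for a classifier.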
Similar resources
Looking at Globalization of English in the Context of Internationalism
The present study is an attempt to provide a current synopsis of World Englishes within globalized communities, as well as theoretical and applied feasibility of global linguistic features of English as an International Language (EIL). To do so, first, three main reactions against the spread of English by scholars around the world are discussed. Then, the possibility of describing and teaching ...
Code-Copying in the Balochi Language of Sistan
This empirical study deals with language contact phenomena in Sistan. Code-copying is viewed as a strategy of linguistic behavior when a dominated language acquires new elements in lexicon, phonology, morphology, syntax, pragmatic organization, etc., which can be interpreted as copies of a dominating language. In this framework Persian is regarded as the model code which provides elements for b...
Constitutive Features of the Russian Political Discourse in Ecolinguistic Aspect
The article offers a comparative description of typological mechanisms used in political communicative practice and methods of verbal explication of its axiological and symbolic constituents determining universal mental features of individual/collective consciousness. The research position based on a systemic multilevel analysis of the component structure of discourse facilitates the identifica...
Gender-preferential Linguistic Elements in Applied Linguistics Research Papers: Partial Evaluation of a Model of Gendered Language
This article investigates whether the gender-preferential linguistic elements found by Argamon, Koppel, Fine and Shimoni (2003) show the same gender-linked frequencies in applied linguistics research papers written by non-native speakers of English. In so doing, a sample of 32 articles from different journals was collected and the proportion of the targeted features to the whole numb...
The Discursive Construction of “Native” and “Non-Native” Speaker English Teacher Identities in Japan: A Linguistic Ethnographic Investigation
Recent poststructuralist theories of identity posit identities as being discursively constructed in interactions with society, institutions, and individuals. This study used a Linguistic Ethnographic framework to investigate the discursive identity construction of two English teachers, one ‘non-native’ English speaker, and one ‘native’ English speaker, teaching English in a tertiary institution...